12 ◾ Bioinformatics
a Phred quality score to measure the accuracy of each base called. The Phred quality score
(Q-score) transforms the probability of calling a base wrongly into an integer score that is
easy to interpret. The Phred score is defined as
$$p = 10^{-Q/10} \tag{1.3}$$

$$Q = -10 \log_{10}(p) \tag{1.4}$$
where p is the probability of the base call being wrong as estimated by the caller software.
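As an illustration of Equations 1.3 and 1.4, the conversion in both directions can be sketched in a few lines of Python. The function names `error_prob_from_q` and `q_from_error_prob` are hypothetical, chosen only for this example; they do not come from any base-calling software.

```python
import math


def error_prob_from_q(q: float) -> float:
    """Equation 1.3: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def q_from_error_prob(p: float) -> float:
    """Equation 1.4: Q = -10 * log10(p)."""
    # p is a probability of error, so it must lie in (0, 1];
    # p = 0 would give an infinite score.
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q={q}: p={error_prob_from_q(q):.5f}")
```

Equation 1.4 generally yields a non-integer value, so this sketch leaves any rounding to the caller, for example `round(q_from_error_prob(p))`, before the score is stored as an integer.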
The Phred quality scores are encoded as single ASCII characters. Every ASCII character
has a decimal number associated with it. However, because the first 32 ASCII characters
(decimal 0 to 31) are non-printable and decimal 32 is the space character, the encoding
starts at decimal 33, the exclamation mark “!”, which therefore represents Q=0. The
encoding that begins with “!” as zero is called Phred+33 encoding. Illumina 1.8 and later
versions use this Phred+33 encoding (Q33) to encode the base call quality in FASTQ files. The
older Illumina versions (e.g., Solexa) used Phred+64 encoding, in which the character “@”,
whose decimal number is 64, corresponds to Q=0. Table 1.1 shows the Phred quality score
(Q), the corresponding error probability (p), and the decimal number and ASCII character
used to encode it. For instance, when the probability of calling a base wrongly is 0.1, the
Phred score will be 10 (Q=10), but instead of the number 10, that quality score is written
as the plus sign “+” (decimal 43 = 10 + 33).
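The offset encoding described above can be sketched with Python's built-in `ord` and `chr`. This is a minimal illustration, not the method of any particular FASTQ parser: the function names `decode_quality` and `encode_quality` and the `offset` parameter are assumptions introduced for this example.

```python
def decode_quality(quality_string: str, offset: int = 33) -> list:
    """Turn a FASTQ quality string into a list of integer Q-scores.

    offset=33 corresponds to Phred+33; offset=64 to Phred+64.
    """
    scores = [ord(ch) - offset for ch in quality_string]
    # A negative score means the string was not written with this offset.
    if any(q < 0 for q in scores):
        raise ValueError("character below the chosen offset; wrong encoding?")
    return scores


def encode_quality(scores, offset: int = 33) -> str:
    """Turn integer Q-scores into their single-character ASCII encoding."""
    return "".join(chr(q + offset) for q in scores)


if __name__ == "__main__":
    print(decode_quality("!+5?I"))       # Phred+33 string
    print(encode_quality([10]))          # Q=10 under Phred+33
    print(decode_quality("@", offset=64))  # Q=0 under Phred+64
```

Passing the offset as a parameter keeps one pair of functions usable for both encodings; deciding which offset a given file uses is outside the scope of this sketch.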
Higher Q-scores indicate a smaller probability of error, and lower Q-scores indicate
low-quality base calls that are more likely to be wrong. For instance, a quality score
of 20 indicates a 1 in 100 chance of an error, corresponding to 99% call accuracy.
In general, the Q-score of 30 is considered a benchmark
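TABLE 1.1 Phred Quality Score and ASCII_BASE 33 (Q33)

| Q | p | ASCII (decimal) | ASCII (character) | Q | p | ASCII (decimal) | ASCII (character) | Q | p | ASCII (decimal) | ASCII (character) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.00000 | 33 | ! | 15 | 0.03162 | 48 | 0 | 30 | 0.00100 | 63 | ? |
| 1 | 0.79433 | 34 | " | 16 | 0.02512 | 49 | 1 | 31 | 0.00079 | 64 | @ |
| 2 | 0.63096 | 35 | # | 17 | 0.01995 | 50 | 2 | 32 | 0.00063 | 65 | A |
| 3 | 0.50119 | 36 | $ | 18 | 0.01585 | 51 | 3 | 33 | 0.00050 | 66 | B |
| 4 | 0.39811 | 37 | % | 19 | 0.01259 | 52 | 4 | 34 | 0.00040 | 67 | C |
| 5 | 0.31623 | 38 | & | 20 | 0.01000 | 53 | 5 | 35 | 0.00032 | 68 | D |
| 6 | 0.25119 | 39 | ' | 21 | 0.00794 | 54 | 6 | 36 | 0.00025 | 69 | E |
| 7 | 0.19953 | 40 | ( | 22 | 0.00631 | 55 | 7 | 37 | 0.00020 | 70 | F |
| 8 | 0.15849 | 41 | ) | 23 | 0.00501 | 56 | 8 | 38 | 0.00016 | 71 | G |
| 9 | 0.12589 | 42 | * | 24 | 0.00398 | 57 | 9 | 39 | 0.00013 | 72 | H |
| 10 | 0.10000 | 43 | + | 25 | 0.00316 | 58 | : | 40 | 0.00010 | 73 | I |
| 11 | 0.07943 | 44 | , | 26 | 0.00251 | 59 | ; | 41 | 0.00008 | 74 | J |
| 12 | 0.06310 | 45 | - | 27 | 0.00200 | 60 | < | 42 | 0.00006 | 75 | K |
| 13 | 0.05012 | 46 | . | 28 | 0.00158 | 61 | = | 43 | 0.00005 | 76 | L |
| 14 | 0.03981 | 47 | / | 29 | 0.00126 | 62 | > | 44 | 0.00004 | 77 | M |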